Statistical learning final project: exploratory data analysis

Daniel A.

UID: 100444499

Importing libraries and data, and setting options

Helper Functions

Variables

Target Variable: HDI as a category (very high, high, medium, low)

Method of calculation (from Wikipedia):

[Image: HDI method of calculation, from Wikipedia]
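As a rough sketch of how the target could be derived, the snippet below assumes the image showed the post-2010 UNDP formula as given on Wikipedia (a geometric mean of the life expectancy, education and income indices) and that the four categories use the usual UNDP cutoffs of 0.800, 0.700 and 0.550. The function and argument names are hypothetical and do not come from the notebook.

```python
import math


def hdi(life_expectancy, mean_years_schooling, expected_years_schooling, gni_per_capita):
    """Human Development Index, assuming the post-2010 UNDP method."""
    # Life Expectancy Index: rescales life expectancy from the range 20..85 years
    lei = (life_expectancy - 20) / (85 - 20)
    # Education Index: mean of the two schooling sub-indices
    mysi = mean_years_schooling / 15
    eysi = expected_years_schooling / 18
    ei = (mysi + eysi) / 2
    # Income Index: log-scaled GNI per capita (PPP $) between 100 and 75,000
    ii = (math.log(gni_per_capita) - math.log(100)) / (math.log(75_000) - math.log(100))
    # HDI is the geometric mean of the three indices
    return (lei * ei * ii) ** (1 / 3)


def hdi_category(value):
    """Map an HDI value to its category (assumed UNDP cutoffs)."""
    if value >= 0.800:
        return "very high"
    if value >= 0.700:
        return "high"
    if value >= 0.550:
        return "medium"
    return "low"
```

If the dataset already contains the HDI value, only `hdi_category` would be needed to build the target.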

Plots per variable

Foreign Investment Inflows

There are several extreme outliers for GDP, both overall and within each group, and most values concentrate around a specific range. The highest GDP countries

Our main overall outliers are the following:
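The notebook does not say which criterion was used to flag outliers. One common choice is Tukey's rule (values more than 1.5 times the interquartile range beyond the quartiles), sketched below both overall and per HDI group. The rule itself, the helper names and the `hdi_cat` / `value` keys are assumptions for illustration only.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def outliers_by_group(rows, value_key, group_key, k=1.5):
    """Apply the same rule separately within each group (e.g. each HDI category)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    # quantiles() needs at least two observations, so smaller groups are skipped
    return {g: iqr_outliers(vals, k) for g, vals in groups.items() if len(vals) >= 2}


# Toy rows with made-up numbers, only to show the call pattern
toy_rows = [
    {"hdi_cat": "low", "value": v} for v in (1, 2, 3, 4, 5, 100)
] + [
    {"hdi_cat": "high", "value": v} for v in (10, 11, 12, 13)
]
overall = iqr_outliers([r["value"] for r in toy_rows])
per_group = outliers_by_group(toy_rows, "value", "hdi_cat")
```

Running the rule per group matters because a value that looks ordinary overall can still be extreme within its own HDI category, and the reverse.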

Exports as a percentage of GDP

Inflation

Years of compulsory education

Education budget as a percentage of GDP

Gross domestic savings as a percentage of GDP

International tourism arrivals

International tourism receipts

Percentage of the population that uses the internet

Access to electricity as a percentage of the population

Percentage of agricultural land

Crude birth rate

Gross national expenditure as a percentage of GDP

Mobile cellular subscriptions per 100 people

Infant mortality rate

Sex ratio at birth

Greenhouse gas emissions

Percentage of urban population

Correlation plots

General correlation plot (Pearson)

Correlation matrix plot segregated by HDI
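A minimal sketch of the numbers behind a correlation matrix split by HDI category: rows are grouped by category and a Pearson matrix is computed within each group. It uses only the standard library, and the helper names and column keys are hypothetical. With pandas, `DataFrame.groupby(...).corr()` would give the same kind of result.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict matrix[a][b] of pairwise Pearson correlations."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def correlation_by_group(rows, group_key, variables):
    """One correlation matrix per value of group_key (e.g. per HDI category)."""
    grouped = {}
    for row in rows:
        cols = grouped.setdefault(row[group_key], {v: [] for v in variables})
        for v in variables:
            cols[v].append(row[v])
    return {g: correlation_matrix(cols) for g, cols in grouped.items()}
```

Comparing the per-group matrices with the general one shows whether a relationship holds across all HDI levels or only appears because the groups differ in their means.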

Calculating (max) correlation using multiple methods
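The notebook does not list which methods were compared. The sketch below assumes Pearson, Spearman and Kendall (tau-a, without tie correction) and keeps, for a pair of variables, the coefficient with the largest absolute value. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def max_correlation(x, y):
    """Return (method, coefficient) for the method with the largest |coefficient|."""
    scores = {"pearson": pearson(x, y), "spearman": spearman(x, y), "kendall": kendall(x, y)}
    best = max(scores, key=lambda m: abs(scores[m]))
    return best, scores[best]
```

Taking the maximum across methods helps surface monotonic but non-linear relationships that Pearson alone would understate. With pandas, the three coefficients are available through `DataFrame.corr(method=...)` with `"pearson"`, `"spearman"` or `"kendall"`.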